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ABSTRACT 
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In the Modern-Era Retrospective analysis for Research and Applications, 
version 2 (MERRA-2) system the land is forced by replacing the model- 
generated precipitation with observed precipitation before it reaches the sur- 
face. This approach is motivated by the expectation that the resultant improve- 
ments in soil moisture will lead to improved land surface latent heating (LH). 
Here we assess aspects of the MERRA-2 land surface energy budget and 2 m 
air temperatures (T7”). For global land annual averages, MERRA-2 appears 
to overestimate the LH (by 5 Wm~? ), the sensible heating (by 6 Wm~7), 
and the downwelling shortwave radiation (by 14 Wm~*), while underestimat- 
ing the downwelling and upwelling (absolute) longwave radiation (by 10-15 
Wm? each). These results differ only slightly from those for NASA’s previ- 
ous reanalysis, MERRA. Comparison to various gridded reference data sets 
over Boreal summer (June-July-August) suggests that MERRA-2 has particu- 
larly large positive biases (>20 Wm~*) where LH is energy-limited, and that 
these biases are associated with evaporative fraction biases rather than radi- 
ation biases. For time series of monthly means during Boreal summer, the 
globally averaged anomaly correlations (Rgnom) with reference data were im- 
proved from MERRA to MERRA-?, for LH (from 0.39 to 0.48 vs. GLEAM 
data) and the daily maximum T™ (from 0.69 to 0.75 vs. CRU data). In re- 
gions where T~” is particularly sensitive to the precipitation corrections (in- 
cluding the central US, the Sahel, and parts of south Asia), the changes in 
the T2” Ranom are relatively large, suggesting that the observed precipitation 


influenced the 77” performance. 
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1. Introduction 


The NASA Global Modeling and Assimilation Office recently released the Modern-Era Ret- 
rospective analysis for Research and Applications, version 2 (MERRA-2; Gelaro and Coauthors 
(2017)). This new global reanalysis product replaces and extends the original MERRA atmo- 
spheric reanalysis (Rienecker et al. 2011), as well as the MERRA-Land reanalysis (Reichle et al. 
2011). In addition to several other major advances, MERRA-? uses observed precipitation in place 
of model-generated precipitation at the land surface during the atmospheric model integration. The 
use of observed precipitation in MERRA-? was refined from the approach used for MERRA-Land 
(Reichle et al. 2017b), which was an offline (land only) replay of MERRA forced by atmospheric 
fields from MERRA but with the precipitation forcing corrected using gauge-based observations. 

The motivation for using observed precipitation in reanalyses is that precipitation is the main 
driver of soil moisture, which in turn controls the partitioning of incident surface radiation between 
latent heat (LH) and sensible heat (SH) fluxes back to the atmosphere. Reichle et al. (2017a) 
show that both MERRA-2 and MERRA-Land have improved upon the land surface hydrology of 
MERRA, showing better agreement with independent observational time series of soil moisture, 
terrestrial water storage, stream flow, and snow amount. Here, we extend this work, by evaluating 
the MERRA-2 surface energy budget and 2 m temperatures (T~”) over land. In particular, we 
focus on whether the improved hydrology in both the (offline) MERRA-Land and the (coupled 
land/atmosphere) MERRA-?2 data sets translates into the expected improvements to the monthly 
mean LH and SH. We also expand previous work by evaluating the reanalyses land surface output 
globally, rather than focusing on locations with high quality ground-based observations. 

We start by comparing the long-term annual global energy budget over land from MERRA-?2, 


MERRA-Land, and MERRA to state of the art estimates from the literature. These literature 
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estimates, from Trenberth et al. (2009), Wild et al. (2015), and the NASA Energy and Water Cycle 
Studies program (NEWS,NSIT (2007); L’Ecuyer et al. (2015)) were each produced by carefully 
combining multiple input data sets with global energy balance constraints. Taken together they 
represent our best understanding of the long-term annual mean energy budget over land. 

Next, we consider global maps of the performance of the land surface turbulent heat fluxes from 
each reanalyses, as a step towards linking differences in performance to the dominant local physi- 
cal processes and to the potential improvements obtained from the use of the observed precipitation 
in MERRA-2. We focus on the Boreal summer (June-July-August; JJA), since land/atmosphere 
coupling is strongest and surface turbulent heat fluxes are most active in the summer. 

Unfortunately, there are no standard global gridded reference data sets against which the reanal- 
ysis LH and SH can be evaluated. Several recent efforts have compared global LH estimates from 
different combinations of reanalyses, offline land surface models, and diagnostic methods. Most 
estimates generally agree on the regional patterns and local seasonal cycle of LH, although there 
is considerable disagreement in the absolute values and temporal behavior across different flux 
estimates (Jiménez et al. 2011; Mueller et al. 2011; Miralles et al. 2011). Additionally, uncer- 
tainty in the basic model structure is the largest source of disagreement (Schlosser and Gao 2010; 
Mueller et al. 2013). While ground-based observations are available from tower-mounted eddy 
covariance sensors (e.g., Baldocchi and Coauthors (2001)), the number of towers (in the 100’s) 
is well below the sampling needed for global estimation (and their locations are not designed to 
sample globally-representative land cover types). Additionally, the measurements themselves have 
considerable uncertainty and limited spatial representativeness (up to 1 km). 

In the absence of a standard reference, we compare the JJA reanalysis turbulent heat flux esti- 
mates to two different gridded reference data sets: Global Land surface Evaporation: the Ams- 


terdam Methodology (GLEAM) (Miralles et al. 2011; Martens et al. 2017) for LH, and Fluxnet- 
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Model Tree Ensembles (MTE) (Jung et al. 2010) for LH and SH. These data sets were selected for 
several reasons: i) they are amongst the state of the art, i1) they are available globally for multi- 
decadal time periods, iii) they are independent of each other, and iv) they rely on very different 
estimation methodologies (water balance modeling for GLEAM, and upscaling of tower measure- 
ments for MTE). Since neither GLEAM nor MTE represents direct observations of the turbulent 
heat fluxes, we also compare each reanalysis to tower-based eddy covariance observations from 
the Fluxnet-2015 data set (Fluxnet 2015). To determine the potential contribution of radiation bi- 
ases to regional LH and SH biases, we also compare the reanalyses surface radiation fields for JJA 
against gridded observations from the Clouds and the Earth’s Radiant Energy System (CERES) 
and Energy Balanced and Filled (EBAF) data set (Kato et al. 2013). 

Finally, to test whether the changes in the surface energy budget from MERRA to MERRA- 
2 have affected the atmospheric boundary layer, we also evaluate the JJA monthly mean daily 
minimum and maximum T~” against observations from the Climatic Research Unit (CRU) at the 
University of East Anglia (Harris et al. 2014). Improvements in MERRA-2 due to the use of 
observed precipitation cannot be isolated from the many other advances distinguishing MERRA- 
2 from MERRA. Consequently, we establish whether the improvements in the surface turbulent 
fluxes and T7” are at least consistent with the expected improvements from the use of observed 
precipitation, by cross-referencing the evaluation results against the regional sensitivity to precip- 
itation and/or soil moisture. 

This paper is organized as follows. Section 2 summarizes the reanalysis and reference data sets 
used, and Section 3 presents the results, including evaluation of the i) reanalyses annual global 
land energy budget averages, ii) the spatially distributed mean JJA energy budget and T””, and ii) 


the temporal behavior of the JJA turbulent heat fluxes and T7”. We also identify regions of sensi- 
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tivity to the observed precipitation forcing in MERRA-2, for cross-reference against the evaluation 


results. Our findings are summarized in Section 4. 


2. Methodology and data 


a. The reanalyses 


The coverage and resolution of each reanalysis is summarized in Table 1, with further details 
below. MERRA (Rienecker et al. 2011) and MERRA-2 (Gelaro and Coauthors 2017) are atmo- 
spheric reanalyses produced with the NASA Goddard Earth Observing System Version 5 (GEOS- 
5) modeling and data assimilation system, and were designed to provide historical analyses of the 
hydrological cycle across a broad range of climate time scales. To address shortcomings in the 
MERRA land surface hydrology, MERRA-Land (Reichle et al. 2011) was released as an offline 
(land only) replay of MERRA, with the model-generated precipitation corrected using rain-gauge 
observations and with minor, but important, model parameter changes. MERRA-2 features sev- 
eral major advances from MERRA, including an updated atmospheric general circulation model, 
an updated atmospheric assimilation system, an interactive aerosol scheme, and the use of ob- 
served precipitation at the land surface (and to compute wet aerosol deposition). In addition to 
the land model updates from MERRA-Land, MERRA-? includes several more updates relevant to 
the land, as outlined in Reichle et al. (2017a). Most notably, the surface turbulence scheme was 
revised, generally resulting in enhanced SH over land (Molod et al. 2015). 

The method used to apply the observed precipitation at the land surface in MERRA-2 was refined 
from that used in MERRA-Land (Reichle and Liu 2014; Reichle et al. 2017b). In MERRA-Land 
the precipitation was corrected with daily Climate Prediction Center (CPC) Unified (CPCU; Chen 


et al. (2008)) precipitation observations everywhere. For MERRA-2 the input precipitation differs 


127 


128 


129 


130 


131 


132 


133 


134 


135 


136 


137 


138 


139 


140 


141 


142 


143 


144 


145 


146 


147 


148 


in two ways: 1) in the high latitudes the MERRA-2 model-generated precipitation is retained, and 
ii) over Africa the MERRA-? precipitation is corrected with pentad-scale blended satellite and 
gauge-based observations from the CPC Merged Analysis of Precipitation (CMAP; Xie and Arkin 
(1997)) and the Global Precipitation Climatology Project (GPCP; Huffman et al. (2009)) version 
2A 

The land surface turbulent fluxes from the NASA reanalyses (MERRA-2, MERRA-Land, and 
MERRA) have not been explicitly evaluated globally. However, Jiménez et al. (2011) and Mueller 
et al. (2011) both included MERRA LH when merging multiple LH global land data sets into a 
single enhanced estimate (see Section 2.b), and in both studies MERRA was amongst the high- 
est of the input LH estimates used. Additionally, Jiménez et al. (2011) noted a sharp gradient 
in the MERRA LH around 10°S in the tropics that was not present in other LH estimates. This 
bias gradient was traced to MERRA’s excessive rainfall canopy interception and precipitation er- 
rors (Reichle et al. 2011). Consequently, the interception reservoir parameters were revised for 
MERRA-Land (and MERRA-2) to eliminate this feature (the interception reservoir update was 
the most significant modeling change from MERRA to MERRA-Land). 

An additional reanalysis, ERA-Interim, from the European Centre for Medium Range Weather 
Forecasting (Dee et al. 2011), is included in the evaluation of the temporal behavior of the turbulent 
fluxes. In contrast to the NASA reanalyses, ERA-Interim includes a land surface updating scheme 
(de Rosnay et al. 2014). Specifically, the soil moisture, soil temperature, and snow temperatures 
are updated to minimize errors in the forecast screen-level relative humidity and temperature, 
while the snow depths are updated using satellite- and ground-based snow cover and snow depth 


observations. 
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b. Annual global land energy budget estimates 


We compare the reanalyses annual global land energy budgets to three state of the art estimates, 
from Trenberth et al. (2009), Wild et al. (2015), and the NEWS program estimates of L’Ecuyer 
et al. (2015). Each of these is based on a weighted merger of multiple modeled and observed data 
sets, and each applies to the energy budget at the start of the 21st Century. For Trenberth et al. 
(2009) we have used their estimates for the ‘CERES period’ of 2000-2004; Wild et al. (2015) 
nominally refers to the same period; while L’Ecuyer et al. (2015) nominally refers to 2000-2009. 
Note that the MERRA LH and SH over land were used as one of the inputs in NEWS. 

These three global energy budget studies all provide continental and oceanic energy estimates, 
where ‘continental’ is defined as non-ocean, and so includes land, land-ice, and lakes, but excludes 
inland seas. By contrast, the land estimates from MERRA-2, MERRA-Land, and MERRA apply 
to the area modeled by the land surface model, excluding land-ice, lakes, and inland seas. The 
discrepancy due to the inclusion or exclusion of land-ice is significant: land-ice accounts for 10% 
of the continental area, with Antarctica making up 95% of this. NEWS provides energy bud- 
gets for each continent separately (L’ Ecuyer et al. 2015), and we use their (balance-constrained) 
energy budget estimates to approximate the land-only energy budget terms by subtracting the area- 
weighted Antarctica estimates from the global continental estimates. We then use our land-only 
NEWS estimates to approximate the continental to land ratio for each NEWS energy budget term. 
By assuming that the same ratios apply to Trenberth et al. (2009) and Wild et al. (2015) we then 
approximate land-only estimates for the latter two studies. L’Ecuyer et al. (2015) and Wild et al. 
(2015) both provide uncertainty ranges for their globally averaged continental estimates, which 


we have applied unchanged to our approximated land-only estimates. 
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For LH, we have also used three additional global land annual average estimates from the hy- 
drology community, from Jiménez et al. (2011), Mueller et al. (2011), and Mueller et al. (2013). 
These estimates are also based on merging modeled and observed estimates. Jiménez et al. (2011) 
applies to global land (using a similar land definition to the NASA reanalyses) for 1994, while 
Mueller et al. (2011) applies to the global land area, excluding the Sahara, from 1989-1995, and 
Mueller et al. (2013) applies to the global land plus Greenland for 1989-2005. As previously noted, 
MERRA LH was one of the inputs used in the multi-product mergers of Jiménez et al. (2011) and 


Mueller et al. (2011). 


c. Gridded reference data sets 


The coverage and resolution of each gridded reference data set, together with a brief summary 
of important interdependencies with other data sets or reanalyses used in the study and uncertainty 


estimates (where available) are summarized in Table 2, with further details provided below. 


1) GLEAM 


GLEAM (version 3.1a) provides daily estimates of terrestrial evapotranspiration, estimated from 
satellite and reanalysis forcing using a Priestley and Taylor-based model (Miralles et al. 2011; 
Martens et al. 2017). The precipitation is from the Multi-Source Weighted-Ensemble Precipitation, 
which is a multi-model merger of established precipitation data sets, including the same CPCU 
data set used in MERRA-Land and MERRA-2, as well as ERA-Interim precipitation (the latter is 
used predominantly in the high latitudes, where observed precipitation data sets are more uncertain 
(Beck et al. 2017)). The net surface radiation and T2™ are from ERA-Interim. Compared to 


independent observations from 91 flux towers, GLEAM has an average unbiased root mean square 
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error (ubRMSE; or error standard deviation) of 20 Wm~? and an average anomaly correlation of 


0.42 (Martens et al. 2017). 


2) MTE 


MTE provides global estimates of carbon dioxide, energy, and water fluxes at the land surface, 
calculated using a machine learning technique to upscale half-hourly energy balance-corrected 
eddy covariance observations from 253 Fluxnet tower observations (Jung et al. 2011). The input 
Fluxnet observations are from the La Thuile data release, an earlier generation of the Fluxnet- 
2015 data set used here (to be introduced in Section 2.d). CPCU precipitation (again, used directly 
in MERRA-Land and MERRA-2) and a T2” data set based on CRU data (Jung et al. 2011) are 
used as predictive (regression) variables in the MTE. However, this meteorological data has little 
impact on the MTE monthly anomalies, which are instead driven by the vegetation variability 
as observed by the fraction of absorbed Photosynthetically Active Radiation (fPAR; Jung et al. 
(2010)). When 20% of the Fluxnet training data was withheld from the algorithm, the average 
Root Mean Square Error (RMSE) with the withheld data was 15 Wm’, for both LH and SH, and 
the average anomaly correlation was 0.57 for LH and 0.60 for SH (Jung et al. 2011). In general, 
the MTE method is better suited to estimating spatial variability and the seasonal cycle than it is 


to capturing interannual anomaly patterns (Jung et al. 2009). 


3) CRU TEMPERATURE DATA 


CRU TSv4.00 provides gridded monthly means of the daily mean, minimum, and maximum 
temperature over land (Harris et al. 2014; University of East Anglia Climate Research Unit et 
al. 2014). The temperatures are calculated from quality controlled climate station data, which are 


interpolated onto the grid according to an assumed correlation decay distance (set to 1200 km for 
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temperature variables). In instances where no station data are available within the assumed decay 
distance, the published data value defaults to the climatology. Here, such climatological values 
have been screened out. Also, we require at least 10 data points to estimate each statistic for a 
given grid cell. Even with this screening, the gridded output will be much less certain when/where 
station coverage is less dense, which occurs over Africa, South America, central Australia, and the 


high latitudes. 


4) CERES-EBAF RADIATION DATA 


CERES-EBAF version 4.00 surface radiances are produced with a radiative transfer model af- 
ter adjusting modeled and observed input data for consistency with Top of Atmosphere (TOA) 
CERES-EBAF radiation (Kato et al. 2013). The input data (surface, cloud, and atmospheric prop- 
erties) are adjusted according to their observation-based estimated uncertainties. The input temper- 
ature and humidity profiles and land surface skin temperature (7,;;,) are from NASA’s GEOS-5.4.1 
modeling and assimilation system, the same system (although a different version) used in MERRA 
and MERRA-?2. 

The CERES output shortwave irradiances are primarily determined by (observation-based) TOA 
radiation and clouds, hence they are reasonably independent of the MERRA and MERRA-? re- 
analyses (Kato et al. 2013). On the other hand, the CERES output longwave irradiances, and 
particularly the upwelling longwave (LW,,), are strongly dependent on the GEOS-S5 Ty,;jn input. 
However, the CERES algorithm does adjust its input GEOS-S T;;;, with observation-based cloud 
information, so comparison between the CERES-EBAF and GEOS-5 LW, partly reflects these 
observation-based adjustments, even though the two fields are not independent. Compared to in- 
dependent ground-based observations from 24 sites over land, the RMSE of the CERES-EBAF 


radiation is 12 Wm~* for downwelling shortwave (SW,), and 10 Wm~* for downwelling long- 
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wave (LW,) (CERES-EBAF 2017). For the regional estimates over land, CERES-EBAF (2017) 
estimated the uncertainty to be 12 Wm~? for SWy, 4 Wm? for upwelling shortwave (SW,), 10 


Wm for LW,, and 18 Wm~? for LW,,. 


5) GRIDDED DATA SET PROCESSING 


As noted in Tables 1 and 2 some of the reference data sets and reanalyses used here publish 
output that applies only to the land fraction within each grid cell, while others publish a single 
estimate that applies to all surface types (land, permanent land-ice, lakes, ocean) within each grid 
cell. All of the gridded data sets and reanalyses were screened by removing all grid cells where 
the MERRA-? land fraction was less than 50% (after interpolation to the relevant resolution), and 
then aggregated up to monthly means and 1° spatial resolution. All maps of global statistics are 
based on the Boreal summer months of JJA only, and each comparison is made over the maximum 
available co-incident time period, with the time periods noted in the relevant figure captions. The 
anomaly correlations (Ranom) are evaluated based on anomalies from the mean seasonal cycle 
(calculated by subtracting the time period mean separately for each calendar month). The gridded 
reference data sets were also used to estimate the annual global land average values, for which the 


(interpolated) MERRA-2 land area in each grid cell was used. 


d. Fluxnet-2015 tower observations 


The Fluxnet-2015 (Fluxnet 2015) sites were selected by downloading all Tier 1 observations at 
non-irrigated sites within grid cells classified as land at 1° resolution (as derived above in Section 
2.c.5), and for which at least a 10 year data record is available. Eddy covariance sensors underesti- 
mate turbulent heat fluxes and do not generally close the energy balance (Wilson et al. 2002), hence 


we used the Fluxnet-2015 energy balance closure-corrected LH and SH (see Fluxnet (2015) for 
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details of the correction method). While these corrections are rather uncertain, the corrected LH 
and SH showed better agreement with all of the reanalyses in Table 1 in terms of the means across 
all sites and the correlation of the means between the sites (while having negligible impact on the 
mean time series anomaly correlations). The balance-corrected Fluxnet data were then screened 
to retain only days with less than 10% gap-filled data, and only sites with data for at least 2550 
days (~ 70% of 10 years). The monthly means were then calculated for months with at least 15 
days of observations after the above screening, and the corresponding reanalysis monthly means 
were estimated using the same days. The resulting Fluxnet monthly time series were visually in- 
spected, and obviously unrealistic features were removed. Four sites with unrealistic time series 
were removed. Of the remaining 21 stations, just one was in the Southern Hemisphere. Since our 
evaluation focuses on the Boreal summertime, this site was excluded. The remaining 20 sites that 


have been used in this study are listed in supplemental Table 1. 


3. Results 


a. Annual global land energy budgets 


The globally averaged annual land energy budget estimates for MERRA-2, MERRA-Land, and 
MERRA are illustrated in Figure 1, with numerical values given in Table 3. For each term, the 
estimates for MERRA-2 and MERRA are similar (within 2-3 Wm~7), while the partitioning of 
Rnet into LH and SH differs for MERRA-Land, which is shifted towards greater SH. Compared to 
MERRA, MERRA-Land has 11 Wm~? more SH, and 8 Wm? less LH, with the difference in Rye: 
due to decreased LW, (recall that in the offline MERRA-Land SW,,.; and LW, are taken directly 


from MERRA). 
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Figure | also includes the energy budget estimates from the literature (see Section 2.b), as well 
as the annual global land averages for each of the gridded reference data sets in Table 2. In Figure 
la, the MERRA-2 and MERRA global land LH are higher than all of the other estimates (although 
MERRA-2 is within the Jiménez et al. (2011) and Wild et al. (2015) confidence intervals). The 
three (land-adjusted) LH estimates from the global energy budget studies (Trenberth et al. (2009), 
Wild et al. (2015), and NEWS) are very similar to each other, and to MTE, GLEAM, Mueller et al. 
(2011), and MERRA-Land (all are within 1 Wm~). While the other two LH estimates from the 
hydrology community (Jiménez et al. (2011) and Mueller et al. (2013)) are higher, they are not as 
high as MERRA-2 and MERRA. Compared to the average of the three global land energy budget 
estimates, the MERRA-2 LH is biased high by 6 Wm~? (15%), while MERRA is biased high by 
9 Wm-? (21%), and MERRA-Land is much closer, being biased high by just 1 Wm? (2%). 

For the global land SH in Figure 1b, MERRA-2 and MERRA are both higher than Trenberth 
et al. (2009) and Wild et al. (2015), although lower than NEWS (but within the NEWS confidence 
interval) and very close (within 1 Wm~*) to MTE. Compared to the average of the three global 
land energy budget estimates, MERRA-2 is biased high by 5 Wm~* (15%) and MERRA by 4 
Wm? (12%), while MERRA-Land is much higher, with a bias of 15 Wm~? (42%). 

The positive biases in both LH and SH from the reanalyses indicate a positive bias in the incident 
energy at the land surface. Indeed, Figure 1g shows that the reanalyses Ry exceed the three 
global energy budget estimates, although MERRA-?2 (the lowest of the reanalyses) is only slightly 
higher (2 Wm~7) than the CERES-EBAF value. Compared to the average of the three global 
energy budget estimates, the Rye biases are 12 Wm? (15%) for MERRA-2, 13 Wm-2 (17%) for 
MERRA, and 16 Wm? (21%) for MERRA-Land. Figures |c-f show that the positive Rye; bias in 
MERRA-2 and MERRA is made up of a large positive bias in SW, combined with insufficient LW,,, 


both partly offset by underestimated LW,. For SWz (Figure 1c) MERRA-2 and MERRA are higher 
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than all three global land energy budget estimates and CERES-EBAF, with a bias compared to the 
the three-product average of 14 Wm? (7%) for MERRA-2 and 16 Wm~? (8%) for MERRA. 
For SW, (Figure 1d), MERRA-2 and MERRA are both above NEWS, Trenberth et al. (2009), 
and CERES-EBAE, but below Wild et al. (2015) (although within the confidence interval). Both 
are biased high by 3 Wm? (8%), compared to the three-product average. For LW, (Figure le), 
MERRA-2 and MERRA are lower than the of the other estimates, with biases of -11 Wm? (- 
3%) for MERRA-2 and -10 Wm? (-3%) for MERRA against the three-product average. For 
IW, (Figure 1f) MERRA-2, MERRA-Land, and MERRA are again lower than the other plotted 
estimates, with biases of -11 Wm~* (-3%) for MERRA-2, -13 Wm~? (-3%) for MERRA-Land, 
and -10 Wm? (-3%) for MERRA. 

The literature estimates in Figure 1 are presented as long term means, and each represents dif- 
ferent temporal and spatial coverage. Likewise, the annual global land averages for the gridded 
reference data sets in Figure | are based on the full available (spatial and temporal) coverage for 
each. However, the gridded reference data sets and reanalyses can be cross-screened to ensure that 
they are compared with consistent coverage. With this cross-screening, the MERRA-2 LH bias 
estimate is 7 Wm? vs. GLEAM, or 9 Wm~2vs. MTE, while the SH bias is 1 Wm~? vs. MTE, 
and the radiation biases vs. CERES-EBAF are 10 Wm? for SW,,, 2 Wm~2 for SW, -18 Wm? for 
LW,, -11Wm~? for LW,,, and <0.5 Wm~? for Rner. In general, the above-quoted biases (calculated 
after cross-screening) are all close (within 1 Wm’) to the values estimated from the data plotted 
in Figure 1 (which does not include cross-screening), with the exception of the LH bias vs. MTE, 
which is 6 Wm~* without cross-screening (compared to 9 Wm~7). This discrepancy is due to the 
MITE global mean being lower than it otherwise would be, due to the lack of coverage over the 


Sahara (which has near-zero annual mean LH). 
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b. Land-atmosphere coupling and the MERRA-2 precipitation corrections 


Here, we identify regions where, in MERRA-2, i) LH is sensitive to precipitation (or soil mois- 
ture), and ii) the daily maximum T7” ie) is sensitive to the applied precipitation corrections. 
These regions can then be used to determine where the change in performance from MERRA to 
MERRA-2 is most likely associated with the precipitation corrections. Note that for part i1) above, 
the diurnal temperature range could be expected to have a stronger signal of the daytime turbulent 
heat fluxes (Betts et al. 2017), however a preliminary comparison (not shown) revealed similar re- 
sults for DTR and 7,2”, and we have presented the results for 7,2". since this variables is included 


max? max 


in the published MERRA-? data sets. 


1) SOIL MOISTURE AND LATENT HEATING 


To first order, LH (or evapotranspiration) from soil and vegetation surfaces can be conceptu- 
alized as either a moisture- or energy-limited process. In drier conditions (i.e., for soil moisture 
below some critical point), LH is moisture-limited in that it is restricted by the amount of soil 
moisture available for evapotranspiration. Temporal variations in LH will then be correlated with 
the plant available soil moisture (principally, the soil moisture in the root-zone). In contrast, in 
more humid conditions LH is energy limited; there is sufficient soil moisture available for evap- 
otranspiration, so LH proceeds at the maximum rate determined by atmospheric water demand, 
and temporal variations in LH are accordingly correlated with temporal variations in atmospheric 
demand (net radiation, atmospheric humidity deficit, and wind), rather than soil moisture. 

Figure 2 shows the squared correlation between the JJA monthly anomaly MERRA-2 LH and 
rootzone soil moisture (R2.,4,(LH,SM)). Lower R?,,o_(LH,SM) indicates a tendency towards 


energy-limited LH, which for the Boreal summer occurs in the high latitudes, central and eastern 


Europe, the eastern US, south China, and much of the tropics (the Amazon, equatorial Africa, and 
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southeast Asia). On the other hand, higher R2,,,,(LH, SM) indicates a tendency towards moisture- 


anom 


limited LH, and occurs across the remainder of the low and mid-latitudes. While we have plotted 
JJA to focus on the Boreal summer, there are still regions of moisture-limited LH in the southern 
hemisphere during Austral winter, specifically in arid regions (southern Africa, much of Australia, 


and the desert and steppe regions of South America). 


2) PRECIPITATION FEEDBACK ON AIR TEMPERATURE 


2 


Figure 3 shows maps of the squared anomaly correlation (Ro,om 


) between anomaly timeseries 


of JIA MERRA-2 monthly 72”. and anomaly timeseries of 2-month (current + previous month) 


averaged MERRA-2 precipitation. For example, the June 72!” is compared to the (May+June) 


max 


precipitation, while the July 7,2” is compared to the (June+July) precipitation, and so on. The 


precipitation is lagged like this to allow the precipitation signal to accumulate in the soil, and influ- 


ence the subsequent 72”. In Figure 3a the MERRA-2 model-generated precipitation (PRECTOT) 


max’ 


is used, while in Figure 3b the MERRA-2 observation-corrected precipitation (PRECTOTCORR) 


is used. The R2 


“nom ate plotted only for negative R values, since the dominant local relationship be- 


tween precipitation and daytime temperature is negative (1.e., under moisture-limited conditions, 
reduced precipitation leads to reduced soil moisture, which limits LH and increases SH and T~”). 
Figure 3b reflects the modeled relationship in MERRA-?2 between precipitation falling on the sur- 


face and 72” Even with the difference in time periods, the patterns are similar to those found 
p p 


max’ 


across the contiguous U.S. from observations by Koster et al. (2015). 


Figure 3c then shows the difference between R-,,,,(7,2/",PRECTOTCORR 


anom ( max? ) and 


R2.) (T2™_ PRECTOT 


Ee Wl Botan ). This difference (AR2,,.,) is the increase in the fraction of vari- 


a 


ance in 7,2”. explained by the (observed) precipitation seen by the land (PRECTOTCORR) over 


max 


that explained by the model-generated precipitation (PRECTOT). It thus provides a measure of 
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the local impact of the observed precipitation on the MERRA-2 T2”. This measure is sensitive 
to both the magnitude of the precipitation corrections and the local response of the atmospheric 
model to those corrections. Note that the lack of sensitivity in the high latitudes was inevitable for 
this metric, since the model-generated precipitation is used there. 

For the Boreal summer, the strongest impact of the observed precipitation, which can explain 
more than 25% of the 7,7”. variance, is indicated in the central US, central America, the northern 
tip of South America, across a broad swath along the Sahel, and parts of south Asia. Note that 
these regions do not directly correspond to the regions of strongest moisture-limited LH in Figure 
2, for at least two reasons. First, a strong sensitivity of evapotranspiration to soil moisture (Figure 
2) does not imply that the soil moisture variations are locally strong enough to induce large evap- 
otranspiration variations and thus large impacts on air temperature (Figure 3c). Second, as noted 
previously, the plotted sensitivity also includes a signal of the size of the precipitation corrections, 
and so will be enhanced where the differences between the model-generated and observation- 
corrected precipitation are larger. 

Figure 3c is consistent with previous studies identifying hot-spots of strong coupling between 
the land and T2”. In particular Koster et al. (2006) and Miralles et al. (2012) both identify similar 
regions of strong coupling centered on the central US/central America and the Sahel, although 
they do not agree as well over south Asia. Over South Asia Koster et al. (2006) does not locate a 
hotspot, while Miralles et al. (2012) identifies India as having the strongest coupling, and Figure 
3c suggests patchy regions of coverage spanning from southeast Asia through the north of India. 


For reference, the corresponding maps for the Austral summer (December-January-February) 


2 


are shown in supplemental Figure 1 for R7,,,,, 


(LH,SM) and supplemental Figure 2 for the sensi- 
tivity to the precipitation corrections. In supplemental Figure 1, the R2.,,,,(LH,SM) over Austral 


anom 


summer again shows the expected pattern of moisture-limited LH in drier areas of the summer 


19 


hemisphere (almost everywhere, outside of the tropics). As with the Boreal summer, regions of 
moisture-limitation LH extend into the winter Hemisphere. However, the effect of reduced radia- 
tion close to the poles is now evident in the switch to energy-limited LH, even in arid regions that 
are poleward of around 50° (such as central Asia). Supplemental Figure 2 shows strong sensitivity 
of 72" to the precipitation corrections across nearly all of the southern Hemisphere, including the 
Amazon and tropical Africa. Since these latter two areas typically have saturated soils, this strong 
signal is unlikely due to the precipitation-soil moisture pathway, and is perhaps due to sensitiv- 


ity of evaporative cooling from the canopy interception to changes in precipitation supply to the 


interception reservoir. 


c. Biases over Boreal summer 


In Section 3.a, the biases in the reanalyses’ global land energy budgets were provided as annual 
means. The seasonal cycle of the monthly mean global land biases (not shown) reveal that the 
largest global land biases for all budget terms occur in the Boreal summer (JJA). Below, maps of 
these JJA biases are presented and discussed, together with the corresponding biases in 2 m air 


temperatures. 


1) ENERGY BUDGET TERMS 


Figure 4 shows maps of the reanalyses’ JJA biases in LH and SH compared to each of GLEAM 
and MTE. For LH, the regions of positive and negative biases relative to GLEAM or MTE are 


similar (compare the first and second columns of Figure 4). For both, the LH biases depend on the 


2 


local LH regime, with energy-limited regions (low Room 


(LH ,SM) in Figure 2) generally having 
larger positive LH biases (> 20 Wm~°; e.g., for MERRA-2 in Figures 4d,e across the tropics, south 


Asia, and the northern high latitudes), while moisture-limited regions (high R2,,,(LH,SM) in 


anom 


20 


440 


Figure 2) tend to have smaller biases (magnitude <10Wm*). Consequently, the spatial correlation 


2 
between Ro,.6m 


(LH, SM) (as plotted in Figure 2) and the MERRA-2 LH biases is -0.65 for GLEAM 
and -0.73 for MTE. 

The MERRA LH biases (Figures 4j,k) show some of the same features as for MERRA-2, again 
with a tendency for large positive biases in energy-limited LH regimes. The most prominent 
difference is the sharp bias gradient in MERRA around 10°S (most notable in South America). 
As discussed in Section 2.b, this is associated with the unrealistically large rainfall interception 
reservoir in MERRA, combined with the MERRA precipitation errors; these problems have been 
alleviated in MERRA-2 (and MERRA-Land). Additionally, there are some isolated regions of 
large positive biases in moisture-limited regimes in MERRA that are removed in MERRA-?2 (and 
MERRA-Land), such as in Mexico and south India. 

Overall, in energy-limited regions (R2,,4,(LH,SM) <0.5 in Figure 2) the area-averaged LH bias 
in MERRA-2 (25.5 Wm~* compared to GLEAM, 29.9 Wm~* compared to MTE) was slightly 
higher than for MERRA (24.1 Wm? compared to GLEAM, 27.6 Wm~? compared to MTE), both 
of which are much higher than for MERRA-Land (11.3 Wm compared to GLEAM, and 7.6 
Wm~* compared to MTE). In contrast, in moisture-limited LH regions (R?,,9_(LH,SM) >0.5 in 
Figure 2), the area-averaged LH bias is highest in MERRA (7.0 Wm~? compared to GLEAM, 
5.2Wm? compared to MTB), and reduced in MERRA-? (3.8 Wm? compared to GLEAM, 1.5 
Wm? compared to MTE), and even further reduced in MERRA-Land (0.3 Wm? compared to 
GLEAM, -2.9 Wm~* compared to MTE). 

The third column of Figure 4 shows the reanalyses biases in SH compared to MTE. In gen- 
eral, the SH biases for each reanalyses have an inverse relationship with the LH biases in the 


first two columns (for MERRA-2, the spatial correlation between the SH biases and the LH bi- 


ases is -0.68 for GLEAM LH and -0.78 for MTE LH). Consequently, the evaporative fraction 


PA 
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(EF=LH/(LH+SH)) biases compared to MTE in the first column of Figure 5 show a spatial pattern 
very similar to that of the LH biases (for MERRA-2, the spatial correlation between MTE LH and 
EF biases is 0.83). 

The sum of LH and SH approximates the net incoming radiation (after neglecting the ground heat 
flux and temporal change in 7;;;,). The second and third columns of Figure 5 show, respectively, 
the biases in the reanalyses LH+SH sum compared to MTE and the biases in their Rye compared 
to CERES-EBAF. There is a weak agreement between the Rye; biases suggested by MTE and 
CERES-EBAF (for MERRA-2, the spatial correlation is 0.46). Comparison to MTE (Figures 5, 
second column) suggests that the reanalyses net surface radiation tends to be overestimated, with 
the largest biases (>30 Wm?) occurring over the Amazon, the horn of Arica, and the Tibetan 
Plateau. While comparison to CERES-EBAF (Figure 5, third column) also suggests relatively 
large positive biases over the Tibetan Plateau and the horn of Africa, these positive biases are 
smaller in both magnitude and regional extent than was suggested by MTE. Additionally, CERES- 
EBAF also indicates strong negative biases (<-30 Wm) over the Sahel and the southeast US, 
particularly in MERRA-Land (Figure 51) and MERRA (Figure 51). Finally, inter-comparing the 
Rner biases for each reanalyses shows qualitatively that the broad patterns are similar in MERRA-2 
and MERRA (also MERRA-Land), although MERRA has a tendency towards larger (positive and 
negative) biases. 

There is no obvious correspondence between the regional biases in the LH (compared to 
GLEAM or MTE) and the regional biases in Rye (compared to either MTE LH+SH or CERES- 
EBAP). For example, the spatial correlations are less than 0.1 between the MERRA-2 LH bias 
(implied by comparison to GLEAM or MTE), and the MERRA-2 LH+SH bias (implied by MTE). 
Likewise, the spatial correlations are again less than 0.1 between the MERRA-2 LH bias (implied 


by GLEAM of MTE) and the MERRA-2 R,,.; bias (implied by CERES-EBAF). This suggests then 
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that the pattern of regional biases in the reanalyses LH for JJA (compared to either GLEAM or 
MTE) are associated with differences in the partitioning of incoming radiation into LH and SH, 
rather than with differences in the surface radiation (compared to MTE or CERES-EBAP) itself. 

While radiation biases do not appear to be the main predictor of LH biases, biased radiation will 
results in biased LH and/or SH. Hence, we have partitioned the JJA Rye; bias between MERRA-2 
and CERES-EBAF into the individual contributions from each radiation term. Figure 6 shows the 
JJA biases between MERRA-2 and CERES-EBAF for the SW,.;, LW,z, and LW,,. In terms of the 
direction of the biases, the broad patterns of regional biases in the radiation terms are unchanged 
from MERRA (not shown). The direction of the regional Rye biases for MERRA-2 in Figure 5f 
largely mirror the regional SW,,; biases in Figure 6d (spatial correlation: 0.75), the main exception 
being over the southeast US. The LW biases are somewhat balanced, in that both are negative 
across most of the domain, with the LW, bias in Figure 6e typically being slightly more negative 
than the LW,, bias in Figure 6f. Both have relatively large negative biases (magnitude > 30 Wm~7) 
in northern hemisphere desert regions, and smaller (magnitude: 10-20 Wm) negative biases 
elsewhere. The spatial distribution of the SW,,.; biases mirrors that of the downwelling shortwave 
(SW, not shown), indicating that the SW,,-; biases are primarily driven by SW, differences rather 
than differences in the surface albedo used in CERES-EBAF and GEOS-5. The above patterns 
of overestimated SW, (or SWz) and underestimated LW, across much of the globe are consistent 
with a known tendency for the GEOS-5 systems to underestimate mid-latitude continental cloud 
cover (Molod et al. 2012; Wang and Dickinson 2013; Gelaro and Coauthors 2015). 

The LW, is calculated from the 7,;;,, and the negative biases in MERRA-? (and also MERRA 
and MERRA-Land) indicate a cool bias in the model Tyyjn. At 285 K, a LW, bias of 10 Wm~? is 
roughly equivalent to a 7;;;, bias of 2 K. Recall that the CERES-EBAF LW, is not independent of 


the MERRA suite of reanalyses, due to its use of GEOS-5 T;,in. However, the input GEOS-5 Tyxin 
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is adjusted within the CERES-EBAF algorithm to constrain the TOA irradiance, so comparison 
of GEOS-5 and CERES LW, indicates the adjustment required to the GEOS-5 7);;, to balance the 
TOA fluxes. Previous work has also suggested that the GEOS-5 7\;;, is underestimated, partic- 
ularly in dry regions. For example, in agreement with our Figure 6f, Draper et al. (2015) found 
large cool biases in the GEOS-5 73x; over desert regions in summer (their Fig. 5), compared to 
remotely sensed observations. As argued in Draper et al. (2015), this GEOS-5 T5x;) cool bias is, 
at least in part, caused by the model’s 7;4;, definition differing from that of a true skin layer from 
which LW,, is emitted (or as is observed in the thermal infrared). 

In summary, the pattern of regional LH biases in the reanalyses suggested by GLEAM and MTE 
are very similar. This result adds confidence to the use of GLEAM and MTE for estimating re- 
gional biases in the reanalyses. As with the annual global land averages in Figure 1, the maps 
presented here suggest that MERRA-2 and MERRA (but not MERRA-Land) have a general ten- 
dency to overestimate LH. If the GLEAM, MTE, and CERES-EBAF regional means are assumed 
to be more accurate than the reanalyses, the above comparisons suggest that in energy-limited 
regions, MERRA-2 (and MERRA) overestimate LH due to an overestimated evaporative fraction 
(i.e., too much incoming radiation is converted to LH rather than SH). There is little change in the 
global average biases from MERRA to MERRA-2. However, there are some isolated regions in 
Mexico and south Asia that are typified by moisture-limited LH, where MERRA has positive LH 
biases associated with overestimated EF, while MERRA-2 and MERRA-Land have much smaller 
biases. The precipitation corrections in MERRA-2 (and MERRA-Land) removed a relatively large 
amount of precipitation across these locations (Reichle et al. (2017b); their Figure 3b), strongly 


suggesting that the use of precipitation observations in these products reduced the LH biases. 
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2) AIR TEMPERATURE 


The biases in the MERRA-2 and MERRA JJA monthly mean daily minimum, daily maximum, 
and diurnal range in T2", relative to the CRU data set, are shown in Figure 7 (Ts is not calculated 
by the land-only MERRA-Land system). For the daily minimum T~” GC in the left column, 
both reanalyses tend towards positive (warm) biases, particularly MERRA. For the daily maximum 
T°" (T2) in the center column, MERRA-2 tends towards cool biases, with patches of warm 
biases across central Asia and the Arabian Peninsula (investigation of the large positive bias in 
the Arabian Peninsula suggests it is associated with an error in the CRU reference data, rather 
than the reanalyses). For MERRA, these patches of positive bias are expanded to cover most of 
the desert region in the northern hemisphere, and also much of the southern hemisphere. For 
the diurnal temperature range (DTR) in the third column, the MERRA-2 biases inherit the broad 
spatial pattern of the 72". biases, while for MERRA some of the large positive 7,2!" biases are 
offset in the DTR by co-located positive To 

The LH and SH biases in Figures 4 and the DTR biases in Figure 7 show some of the ex- 
pected regional similarities. In particular, in the high latitudes and the Amazon MERRA-2 has 
relatively large positive LH biases (and negative SH biases) and relatively large negative DTR bi- 
ases. MERRA also has overestimated LH and underestimated DTR in the same regions, as well 
as in southeast Asia and central America. This is consistent with an underestimated DTR caused 
by underestimated SH (and overestimated LH), particularly given that the Rye bias is generally 
neutral in these regions in Figure 5. It should however be noted that the high latitudes and the 
Amazon regions are both data-scarce, and both the reanalyses and reference data sets are less well 


constrained. In other regions there is less correspondence. For example the western US also has 


underestimated DTR for MERRA and MERRA-2, while neither GLEAM nor MTE suggests over- 
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estimated LH. Over all, the spatial correlations between the LH biases and DTR biases are rather 
low (for MERRA-2, they are -0.38 for GLEAM and -0.47 for MTE). 

Recall that in Section 3.c.1 above, the CERES-EBAF comparison suggested that the MERRA-2 
(and MERRA) Tigin 1s generally biased cool, with larger cool biases in desert areas. However, a 
comparison of the LW,, biases in Figure 6f to the re and 7,” biases in Figures 7d,e shows little 
correspondence between them, and in particular the regions of relatively large cool Tyxjn biases 
(underestimated LW, ) in the northern hemisphere deserts do not have cool biases in either 7,’" 


max 


and 72". This apparent contradiction between the temperature biases suggested by comparison 
to the CERES-EBAF LW, (~ Tyxin) and the CRU T?"does not necessarily imply that one of these 


data sets is incorrect, given the likelihood mentioned above that the model 734; biases are at least 


partly associated with the model definition of Ty;in. 


d. Turbulent heat flux anomaly correlations over Boreal summer 


Here the monthly mean turbulent heat flux time series are evaluated over Boreal summer based 
on their temporal correlations (Ranom) with the reference data sets. Figure 8 shows maps of the 
JJA Ranom for each of the NASA reanalyses (MERRA-2, MERRA-Land, and MERRA) and ERA- 
Interim, with the Rgnom calculated separately vs. each of the GLEAM and MTE turbulent heat 
fluxes. For LH, the regional patterns in the Ranom vs. either GLEAM (Figure 8, first column) or 
MITE (Figure 8, second column) show some similar features (for MERRA-?2, spatial correlation 
between Figures 8a and 8b: 0.69). Comparison to Figure 2 again suggests some dependence on 
the LH regime. In the Northern Hemisphere, the LH Ranom is generally highest (~ 0.6) in regions 
where LH is moisture-limited, and generally much lower (<0.2) where LH is energy-limited. The 


two exceptions are the high latitudes, which have high LH Rgnom and energy-limited LH, and the 
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Sahara, which has low LH Rgnom and is moisture-limited (although LH variability in the Sahara is 
very low, making the signal susceptible to noise). 

The Ranom patterns for ERA-Interim in the final row of Figure 8 provide some additional context 
for evaluating the NASA reanalyses. The LH Ranom values are generally higher for ERA-Interim 
than for the NASA reanalyses. As for MERRA-2, the ERA-Interim Rgnom vs MTE is relatively 
low in many energy-limited LH regimes (including the eastern US, tropics, and south Asia), while 
the Ranom for ERA-Interim vs. GLEAM is more spatially consistent, in contrast to the Ranom for 
MERRA-2. The relatively high Ranom between GLEAM and ERA-Interim LH in energy-limited 
LH regimes may well be due to GLEAM having used ERA-Interim radiation and temperature, 
since it is in these regions that these fields will have the strongest influence on the LH. On the 
other hand, the lower Ranom between the NASA reanalyses and the LH reference data sets (and 
also between ERA-Interim and MTE) could be attributed to errors in both the reference data sets 
and the reanalyses under energy-limited conditions. For MTE, this result was expected because 
MTE is thought to be more reliable in estimating temporal variability in moisture limited areas, 
since its temporal variability is largely driven by fPAR (Jung et al. 2010). 

Moving on to SH, the third column of Figure 8 shows the Rgnom vs. MTE for each reanalysis. 
The regional patterns are similar to those for LH, with higher Rgnom ( >0.5) in moisture-limited 
LH regions, and lower (< 0.2) values elsewhere. ERA-Interim Ranom vs. MTE is generally higher 
than the NASA reanalyses, with values greater than 0.5 across most of the globe (and particularly 
in the Northern Hemisphere). Despite the improved LH from MERRA-Land, the SH Ranom vs. 
MTE is lower than for MERRA (or MERRA-2). 

Globally averaged, the rank order of the mean LH Ranom, while rather low, is the same vs. ei- 
ther GLEAM or MTE and follows the expected progression of improvement from MERRA, to 


MERRA-Land, and then to MERRA-2. GLEAM suggests a larger improvement, from a globally 


2d 


averaged Ranom of 0.39 for MERRA to 0.48 for MERRA-2, with MERRA-Land falling in be- 
tween (0.45). MTE suggests an improvement from 0.29 for MERRA to 0.34 for MERRA-?2, with 
MERRA-Land again falling in between (0.32). For SH, the globally averaged Ranom vs. MTE is 
similar for MERRA (0.36) and MERRA-2 (0.37), but is much lower for MERRA-Land (0.28). For 
ERA-Interim, the global mean Ranom for LH is ~0.1 higher than for MERRA-2 (0.60 vs. GLEAM, 
and 0.44 vs. MTE) and ~0.2 higher for SH (0.46 vs. MTE). The better agreement between ERA- 
Interim and the reference data sets could be a consequence of the land surface updates applied in 
ERA-Interim, which indirectly targets the turbulent heat fluxes. (Although recall that the relatively 
strong agreement between the GLEAM and ERA-Interim LH will partly reflect their dependence; 


see Section 2.c.2). 


e. Comparison to Fluxnet tower data 


Since the reference data sets used above do not represent direct observations, we now com- 
pare the globally-averaged LH and SH statistics from Section 3.a (for the annual mean turbulent 
heat fluxes over land), and Section 3.d (for the mean JJA Ranom) to statistics calculated against 
Fluxnet-2015 tower observations. Figure 9 shows the annual mean of the turbulent fluxes aver- 
aged across the 20 tower sites for the Fluxnet (eddy-covariance) measurements themselves and for 
each reanalysis and reference data set averaged across the 20 Fluxnet locations, with the global 
land annual means (from Figure 1) included for reference. For LH, comparison to the Fluxnet 
observations agrees with the results from the global land comparison in Section 3.a, again sug- 
gesting that the MERRA-2 LH is biased high, although the Fluxnet observations suggest a larger 
bias (of 12 Wm~?, or 30%) than was suggested by the global comparison (estimated as 6 Wm~ 
in Section 3.a). Averaged across the 20 Fluxnet sites, the MTE LH is very close to the Fluxnet 


data (within 0.5 Wm~7), while GLEAM is slightly higher. For the interested reader, supplemental 
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Figure 3 shows scatterplots comparing the MERRA-2 and reference data set LH annual means at 
the 20 individual sites. 

For SH, the Fluxnet observations agree less well with the global land comparison. First, the 
annual mean of the Fluxnet data is about 10 Wm~? below the global mean estimates from the 
other reference data sets. For each of the global reference data sets and reanalyses, the annual 
average over the 20 Fluxnet sites is also 15-20 Wm~* lower than the global average, suggesting 
that the relatively low Fluxnet annual mean is associated with the spatial sampling of the Fluxnet 
sites. Second, averaged across the Fluxnet sites, the Fluxnet mean SH is close to that of MERRA- 
Land, and above that of MERRA-2 (by 6 Wm~2, 18 %). In contrast, for the global averages in 
Section 3.a the reference data sets were all close to MERRA-2 (and MERRA), with MERRA-Land 
standing out as being biased high. 

Figure 10 shows the JJA Ranom averaged over the 20 Fluxnet sites for each reanalyses vs. each of 
Fluxnet, GLEAM, and MTE, with the global average JJA Rgnom from Section 3.d also included for 
GLEAM and MTE. The Rgnom for the Fluxnet data are quite low, which is somewhat expected due 
to the mismatch in spatial representation between the tower-based observations and the reanalysis. 
Nonetheless, the Fluxnet Ranom (as well as the GLEAM and MTE Rano at the same locations) 
indicates similar relative reanalysis performance as the global mean Rgnom. In particular, for LH 
MERRA-2 and MERRA-Land outperform MERRA, as also indicated by the global means. How- 
ever, the one discrepancy is that the Rgnom vs. the Fluxnet data is similar for ERA-Interim and 
MERRA-2, while the global comparisons (and also the GLEAM and MTE data averaged across 
the Fluxnet sites) all suggest that ERA-Interim outperforms MERRA-2 (giving mean Ranom around 
0.1 higher). For SH, the rank order between the average JJA Ranom is the same from the Fluxnet 


data than from the global reference data sets, with the MERRA-Land Rgnom again being lower than 
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that for MERRA (and MERRA-?2), and the ERA-Interim average Ranom being higher than that for 
MERRA-2. 

It is notable that over the Fluxnet tower sites, both GLEAM and MTE have higher average Ranom 
with the reanalyses than the Fluxnet observations do. In particular, MTE was trained on an earlier 
generation of the Fluxnet data, and the higher mean Rgnom vs. MTE than vs. Fluxnet suggests that 
the MTE algorithm has added coarse-scale information (similar quality control was applied here 
as was applied to the tower observations used in MTE). For the interested reader, supplemental 
Figure 4 shows scatterplots of the MERRA-2 LH Ranom vs. each reference data set at the 20 
individual sites. 

Note that for Fluxnet, the Rgnom for (LH+SH), plotted in Figure 10c is consistently about 0.1 
higher than the Ranom for either LH or SH separately. Decker et al. (2012) obtained a similar 
result for the correlation between reanalyses and tower observations. This indicates that the eddy 
covariance measurements and the reanalyses have a stronger agreement in the implied incoming 
radiation than in the partitioning of that radiation into LH and SH (this result is unchanged if the 
Ranom ate calculated from the Fluxnet data that have not been energy balance-corrected ). This 
could be a signal of errors in the partitioning within the reanalyses, or perhaps just as likely, 
this difference is associated with the spatial representation of the tower observations, since the 


incoming radiation is more spatially homogeneous than either LH or SH on its own. 


ese jf. Precipitation Corrections and Air Temperature Performance 


Finally, we seek to establish whether the precipitation corrections in MERRA-2 influenced the 


local 7,2". We do this by comparing the performance of the MERRA-2 and MERRA T2”. to 


max* max 


Figure 3c, which shows the MERRA-2 sensitivity to observed precipitation. Figure 11 shows the 


T2™. Ranom VS. CRU observations over JJA for MERRA-2 and MERRA. In general, the MERRA- 


max 
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2 Ranom 18 high (> 0.7) across most of the domain, particularly in the high latitudes, with much 
lower (< 0.4) values across much of the tropics and parts of South America, Africa, and south 
Asia. Note that the latter regions all have relatively sparsely distributed CRU station data, which is 
likely contributing to the lower agreement with the reanalyses. Compared to MERRA, the greatest 


improvements in the MERRA-2 | ele Ranom occurred in the eastern US, much of tropical South 


America and Africa, the Sahel, and parts of south Asia and China. There are also several regions 


where the y eesike Ranom 18 reduced, including northern South America, and much of southeast Asia. 


Overall, the global averaged Te Ranom VS. CRU was increased from 0.69 for MERRA to 0.75 for 
MERRA-2. 


Comparing Figure 11c to Figure 3c, the regions with the strongest sensitivity of T7!". to the 


precipitation corrections generally have relatively large changes in the 72”. Ranom (including the 


Sahel, parts of south Asia, and central America). Consequently, where the metric in Figure 
3c is above 0.25 (1.e., the observation-corrected precipitation explains at least 25% more of the 


MERRA-2 72" variance than the model-generated precipitation does), the area-averaged absolute 


change in the Rgnom 1s 0.15, compared to an area-average absolute change of 0.07 elsewhere. This 


tendency toward relatively large change in the Te Ranom Where i ei is sensitive to the precipita- 


tion corrections suggests that the observed precipitation in MERRA-? contributed to the change in 


T2™ performance. Additionally, the change in 72”. Ranom in these regions is generally, although 


not always, positive (giving an area averaged change in the Ranom of 0.06 where the metric in Fig- 
ure 3c is greater than 0.25). In some of the instances where the | Me Ranom is degraded, this can be 
traced back to errors in the precipitation observation data sets input into MERRA-2. For example, 
over Myanmar, the y Meni Ranom 18 decreased by more than 0.15, likely due to persistent local errors 


in the precipitation observations input into MERRA-2 (Reichle et al. 2017b). Finally, there are 


also regions with large changes in the 72”. Ranom Outside of the regions of 7,7’". sensitivity to pre- 
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cipitation (the eastern US, tropical Africa and South America, and central China). The i all Ranom 


is increased in MERRA-2 across most of these regions, likely due to other advances (beyond the 


use of observed precipitation) in the MERRA-2 modeling and assimilation system. 


4. Summary and conclusions 


The land surface energy budgets of three reanalyses from NASA (MERRA, MERRA-Land, and 
MERRA-2) are compared here to the best available estimates from the literature and to (largely) 
independent global reference data sets. In terms of the global land annual averages, the results sug- 
gest that the MERRA-2 LH and SH are biased high by 5 Wm~? and 6 Wm”, respectively, while 
SW,, has a large positive bias of 14 Wm~7, SW, is biased high by 3 Wm~*, and the upwelling and 
downwelling LW components are biased low, by 11 Wm~? and 13 Wm, respectively. Compared 
to MERRA, this is a slight (~ 2 Wm’) reduction in the LH and SW,,-; biases, while the difference 
is even smaller for the LW terms (~ 1 Wm~?). The radiation biases are associated with known 
issues in the GEOS-5 models used in the reanalyses, specifically a tendency to underestimate mid- 
latitude continental clouds (Wang and Dickinson 2013) and a cool bias in the model 7;4;,, (Draper 
et al. 2015). 

Compared to reference flux estimates from GLEAM and MTE over the Boreal summer (when 
both the fluxes themselves and their biases are greatest), the largest MERRA-2 LH biases (>20 
Wm?, vs. either GLEAM or MTE) occur in regions where LH is energy-limited, such as in the 
high latitudes, the tropics, parts of south Asia, and the eastern US. The MERRA-2 LH biases are 
typically smaller in regions where LH is moisture-limited, which include the drier regions of the 
mid and low latitudes. In some of these moisture-limited regions (parts of south Asia and Mexico) 
the high bias in the MERRA LH was largely removed in MERRA-2 (and MERRA-Land), likely 


because the observed precipitation used in the latter was lower than that produced by the MERRA 
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(or MERRA-2) modeling systems. Finally, comparison to the evaporative fraction from MTE and 
to Rner from CERES-EBAF or as inferred from MTE LH+SH indicates that the regional biases in 
the reanalyses LH are generally associated with differences in the partitioning of R,.; into LH and 
SH rather than with differences in the radiation input. 

The temporal agreement between the reanalyses and the reference data sets over Boreal summer 
was measured using the monthly anomaly correlation (Rgnom) over JJA. For LH, the Ranom between 
the reanalyses and the reference data sets (GLEAM and MTE) again showed some dependency 
on the LH regime, with a tendency towards better agreement where LH is moisture-limited than 
where it is energy-limited. The lower agreement in energy-limited regions does not necessar- 
ily imply poorer performance in the reanalyses, as it may be due to errors in the reference data 
sets. The globally averaged Ranom values show the expected improvement in skill with each new 
NASA reanalyses. For example, MERRA-? has slightly better globally averaged LH Ranom (0.48 
vs GLEAM) than MERRA-Land (0.45), which is substantially better than MERRA (0.39). The 
Ranom Was also calculated for the monthly mean daily T2™ vs. CRU reference data over JJA. Av- 


max 


eraged over global land, the JJA Te Ranom VS. CRU increased from 0.69 for MERRA to 0.75 
for MERRA-2. The results presented above for the regional biases and Rgnom were based on the 
Boreal summer, however the same analysis has been performed over the Austral summer (not 
shown), yielding qualitatively similar results. 

The use of observed precipitation in MERRA-2 was motivated by the hope that the subsequent 
improvements in simulated soil moisture would lead to the improved partitioning of incoming 
radiation between latent and sensible heating, ultimately leading to improvements in the diurnal 
evolution of the boundary layer. It is difficult, however, to unequivocally attribute the improve- 


ments in MERRA-2 to the use of observed precipitation because MERRA-2 includes many other 


modeling and assimilation advances relative to MERRA. Nonetheless, many of the improvements 


oi 


in the MERRA-2 LH and T” are consistent with the changes expected from the use of observed 
precipitation. MERRA-2 and MERRA-Land have smaller positive LH biases and higher LH Ranom 
than MERRA in regions where LH is moisture-limited and thus sensitive to precipitation (south 
Asia and the western US). This is most easily explained by the forcing of the land surface with ob- 
served precipitation in MERRA-2. Additionally, regions where the MERRA-2 JJA 7,2". was most 
sensitive to the precipitation corrections (the Sahel, central US, and parts of south Asia), generally 
experience larger changes in the T2™ Ranom ftom MERRA to MERRA-2. However, the changes 


max 
in Ranom in these areas are not uniformly positive, and in some cases degraded Tt Ranom can be 
traced back to problems in the input precipitation data sets (e.g., over Myanmar). In the future, the 
use of precipitation corrections could be enhanced by also implementing a land data assimilation 
scheme to update the model soil moisture according to observations (e.g., Draper et al. (2011); 
Dharssi et al. (2011); De Lannoy and Reichle (2016)). By making use of remotely sensed obser- 
vations, the land data assimilation would be particularly valuable in regions where the rain-gauge 
network is sparse or has known problems (e.g., in Africa and parts of southeast Asia). 

However, some of the largest biases and lowest Ranom for the MERRA-? turbulent fluxes occur 
where the LH is energy-limited and thus less sensitive to improvements in the precipitation and 
soil moisture. Hence, future efforts to improve the MERRA-? land surface turbulent fluxes would 
best be focused on other facets of the modeling and assimilation. Specifically, future GEOS-5 
development should focus on the overestimated evaporative fraction where LH is energy-limited. 
Additionally, even though the MERRA-?2 R,,¢; is relatively unbiased (compared to CERES-EBAF), 
there are large compensating biases in the individual SW and LW radiation fluxes that are 2-3 times 
the magnitude of the LH biases in terms of the global land annual averages. Reducing the cloud 
bias in the atmospheric model will help these biases, as will re-defining the model 7;4;,, to generate 


a LW,, more consistent with observations. 
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Finally, the SH results for MERRA-Land are troubling. While MERRA-Land did have the 
desired reduction in the LH biases compared to MERRA (to 1 Wm~? in the global land annual 
average), it also had a compensating, and much larger, increase in the SH bias (up to 15 Wm~* 
in the global land average). Additionally, the JIA Ranom compared to MTE were reduced from 
MERRA to MERRA-Land (from a global average of 0.36 to 0.28), despite the LH Ranom being 
increased. The cause of the degraded SH in MERRA-Land is presently unknown, but given the 
otherwise similar MERRA and MERRA-Land land surface models and meteorological forcing, 
an obvious possibility is that the use of observed precipitation in an offline (land-only) replay of 
an analysis, such as MERRA-Land, can lead to inconsistencies in the forcing (e.g., warm and dry 
air, stemming from dry conditions in MERRA, overlying cold ground induced by high antecedent 
rainfall from the observations). Such inconsistencies would not appear in MERRA or (as much) 
in MERRA-?2, given the coupling in the reanalyses of the land surface state with the overlying 
atmosphere. 

While this work focused on evaluating surface energy fluxes in MERRA-?2, the findings have 
relevance to anyone interested in designing a methodology to evaluate global estimates of turbu- 
lent heat fluxes. The gridded LH reference data sets (GLEAM and MTE) had better agreement 
with the reanalyses time series (as measured by Rgnom), and were more useful for evaluating the 
reanalysis output than were the tower observations. In particular they offer (near-) global cover- 
age across several decades, at similarly course resolution to the reanalyses. In the absence of a 
recognized truth for LH (or other similar terms), the recommended evaluation strategy is to com- 
pare the product under evaluation to multiple data sets. However, given the uncertainty in the 
available reference data sets, extra care is necessary to understand the methodology, input data, 
assumptions, and potential dependencies and weaknesses of each reference data set. This process 


relies on expert judgement and inevitably introduces some subjectivity into the interpretation of 


Ee) 


the results. Further development of global gridded LH data sets (including the quality and quantity 
of ground-‘truth’ observations), to increase their confidence would obviously be of great benefit 
to this process. 

The GLEAM and MTE reference data sets used here are independent of each other and are based 
on very different methodologies, thus providing complementary information for use in an evalua- 
tion. However, given the use of the common precipitation input data in GLEAM as in MERRA-?2, 
and the fact that MTE data is not optimized to estimate interannual variability, LH estimates from 
a third reference data set would be useful. Emerging global and multi-decadal land surface flux 
data sets based on an energy balance approach (Anderson et al. 2011), or alternative observational 
frameworks (Alemohammad et al. 2017) would provide useful complements to GLEAM and MTE 


for a more comprehensive analysis. 
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TABLE 1. The reanalyses 


Data set Variables used Output coverage and resolution 
(variable data set citation, where available) 
MERRA-2 1980-ongoing, hourly, 5/8° x 0.5° 
LH,SH, LWhe, SWhet global land (Global Modeling and Assimilation Office 2015b) 
LWa global surface (Global Modeling and Assimilation Office 2015a) 
qT", Tm global surface (Global Modeling and Assimilation Office 2015c) 
MERRA-Land 1980-2016, hourly, 2/3° x 0.5° 
LH, SH, LWyet global land (Global Modeling and Assimilation Office 2008c) 
MERRA 1979-2015, hourly, 2/3° x 0.5° 


ERA-Interim 


LH,SH, LWhrer, SWhet 
LWa 
T2m Tam 


max? *~min 


LH, SH 


global land (Global Modeling and Assimilation Office 2008b) 
global surface (-) 
global surface (Global Modeling and Assimilation Office 2008a) 


1979 - ongoing, monthly mean, 79 km 
global surface 
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960 TABLE 3. Global annual land average energy budget from the NASA reanalyses (Wm~7), estimated over an 


ee: area of 130.2 million km?. 


SWa SW LWa IW, Rnet LH SH 
MERRA-2 204.6 40.7 3126 3855 91.0 47.8 42.2 
MERRA-Land as for MERRA 384.1 95.1 42.5 52.1 
MERRA 206.5 40.9 313.7 386.7 92.6 504 41.2 
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Fic. 1. The global annual mean energy budget over land from the reanalyses (MERRA-2 (M-2); MERRA- 
Land (M-L); MERRA (M)), the literature (NEWS (NEW), Trenberth et al. (2009) (Tre), Wild et al. (2015) (Wil), 
Jiménez et al. (2011) (Jim), Mueller et al. (2011) (Mul), and Mueller et al. (2013) (Mu3)), and the gridded 
reference data sets (MTE, GLEAM (GLM), and CERES (CER)), for a) LH, b) SH, c) SWg, d) SW,, e) LWa, f) 
IW,,, and g) Rye. For NEW, Tre, and Wil, the land mean has been approximated from published continental 
means as described in Section 2.b. Error bars are included where provided, for NEW and Wil these span the 
possible range described by multiple products, and for Jim and Mul these represent one standard deviation 


across multiple products (see citations for full details). 
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between monthly anomalies of LH and rootzone soil moisture (SM) in MERRA-?2 for JJA. No 


12a Value is plotted where the correlation is negative. 
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FIG. 3. JJA sensitivity of the monthly mean 7,”" to precipitation in MERRA-2: R2,,,,,, between the monthly 


ax 


mean 7,2" anomalies, and the two-monthly (current + previous months) precipitation anomalies, for (a) the 


model-generated precipitation (PRECTOT), and (b) the observation-corrected precipitation (PRECTOTCORR), 


anom 


together with their difference (c) AR? 4m = Ronom(PRECTOTCORR, 7”) - R2,,,,(PRECTOT, 72”). Values are 


plotted only where the correlation between 7,2”. and precipitation is negative. 


a, 


a) GLEAM LH [Wm7-? MTE LH [Wm7-?] c) MTE SH [Wm~-? 


(0) 


2) peouibaraee ae 


ele ait - ME ce ashe 


2- GLEAM LH [Wm] e) MERRA-2 - MTE LH We f) 


g) MBRRA-Land = GLEAM LH [Wm-?] h) MERRA-Land - MTE LH [Wm~?] i) MERT A: Laay - MTE SH HAs! 


j) MERRA - GLEAM LH [Wm-?] MERRA - MTE LH [Wm !) MERRA - MTE SH a) 


1030 Fic. 4. The mean JJA turbulent fluxes, with GLEAM LH (column 1), MTE LH (column 2), and MTE SH (column 3) reference data in row 1, and the 
101 difference from the reference data for MERRA-2, MERRA-Land, and MERRA, in rows 2-4. The statistics span 1980-2016 for GLEAM, and 1982-2011 
1o32 for MTE. 
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1036 FIG. 6. The mean JJA radiation terms, from the CERES-EBAF reference data (row 1), and difference from the reference data for MERRA-2 (row 1), 
137 for (columns 1-3) SWher, LW,, and LWy. The statistics span 2000-2015. 
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1041 FIG. 8. Ranom between GLEAM LH, MTE LH, and MTE SH (columns 1-3) for MERRA, MERRA-Land, MERRA-2, and ERA-Interim (rows 1-4), 


io for JJA. Statistics span 1980-2016 for GLEAM and 1982-2011 for MTE. 
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FIG. 9. Bar plot of the mean annual (a) LH and (b) SH, across the 20 Fluxnet site locations, from MERRA-2 


(M-2), MERRA-Land (M-L), MERRA (M), Fluxnet (FIN), MTE, and GLEAM (GLM; LH only), calculated 


using each data set at its native resolution (and screened temporally for Fluxnet availability). For the global data 


sets, circles are plotted for the global land annual mean (taken from Figure 1). 
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Fic. 10. Bar plot of the Rgnom over JIA averaged across the 20 Fluxnet site locations, for (a) LH, (b) SH, 
and (c) LH+SH, between each pair of the reanalyses (MERRA-2 (M-2), MERRA-Land (M-L), MERRA (M), 
and ERA-I (E-J)) and the reference data (Fluxnet (FIN), MTE, and GLEAM (GLM)). The Ranom vs. the Fluxnet 
reference data use the reanalysis output at their reported spatial resolution (and screened temporally for Fluxnet 
availability), while the Ranom VS. GLEAM and MTE use reanalyses and reference data regridded to 1°. For 
GLEAM and MTE, circles are plotted for the global mean JJA Ranom (averaged over subplots of Figure 8). 
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1053 Fic. 11. The (a) MERRA-2 Ranom Vs. CRU monthly mean | lio and (b) the improvement in the 1 exile Ranom 


1s from MERRA to MERRA-2, both over JJA. Statistics span 1980-2015, and white plotted over land indicates 


toss insufficient CRU data. 
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